Lecture 3

Temporal Models and Smoothing

Sara Martino

Dept. of Mathematical Science, NTNU

Janine Illian

University of Glasgow

Jafet Belmont

University of Glasgow

Motivation

Data are often observed in time, and time dependence is often expected.

  • Observations are correlated in time

Motivation

  1. Smoothing of the time effect

What is our goal?

  1. Smoothing of the time effect

Note: We can use the same model to smooth covariate effects!

What is our goal?

  1. Smoothing of the time effect

  2. Prediction

We can “predict” any unobserved data, does not have to be in the future

Modeling time with INLA

Time can be indexed over a

  • Discrete domain (e.g., years)

  • Continuous domain

Modeling time with INLA

Time can be indexed over a

  • Discrete domain (e.g., years)

    • Main models: RW1, RW2 and AR1

    • Note: RW1 and RW2 are also used for smoothing covariates

  • Continuous domain

    • Here we use the so-called SPDE-approach

Discrete time modelling

Example - Height of Lake Erie in time

Goal we want understand the pattern and predict into the future

Random Walk models

Random walk models encourage the mean of the linear predictor to vary gradually over time.

They do this by assuming that, on average, the time effect at each point is the mean of the effect at the neighboring points.

  • Random Walk of order 1 (RW1) we take the two nearest neighbors

  • Random Walk of order 2 (RW2) we take the four nearest neighbors

Random walks of order 1

Idea: \(\longrightarrow\ u_t = \text{mean}(u_{t-1} , u_{t+1}) + \text{Gaussian error with precision } \tau\)

Definition

\[ \pi(\mathbf{u} \mid \tau) \propto \exp\!\left( -\frac{\tau}{2} \sum_{t=1}^{T-1} (u_{t+1} - u_t)^2 \right) = \exp\!\left(-\tfrac{1}{2} \, \mathbf{u}^{\top} \mathbf{Q}\ \mathbf{u}\right) \]

\[ \mathbf{Q} = \tau \begin{bmatrix} 1 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & & \ddots & \ddots & \ddots & \\ & & & -1 & 2 & -1 \\ & & & & -1 & 1 \end{bmatrix} \]

Random walks of order 1

Idea: \(\longrightarrow\ u_t = \text{mean}(u_{t-1} , u_{t+1}) + \text{Gaussian error with precision } \tau\)

Definition

\[ \pi(\mathbf{u} \mid \tau) \propto \exp\!\left( -\frac{\tau}{2} \sum_{t=1}^{T-1} (u_{t+1} - u_t)^2 \right) = \exp\!\left(-\tfrac{1}{2} \, \mathbf{u}^{\top} \mathbf{Q}\ \mathbf{u}\right) \]

  1. Role of the precision parameter \(\tau\) and prior distribution
  2. RW as intrinsic model

What is the role of the precision parameter?

  • \(\tau\) says how much \(u_t\) can vary around its mean

    • Small \(\tau\) \(\rightarrow\) large variation \(\rightarrow\) less smooth effect
    • Large \(\tau\) \(\rightarrow\) small variation \(\rightarrow\) smoother effect

We need to set a prior distribution for \(\tau\).

A common option is the so called PC-priors

Penalized Complexity (PC) priors

  • PC prior are easily available in inlabru for many model parameters
  • They are build with two principle in mind:

    1. The prior discourages overdispersion by penalizing distance from a base model

  • A line is the base model

  • We want to penalize more complex models

Penalized Complexity (PC) priors

  • PC prior are easily available in inlabru for many model parameters

  • They are build with two principle in mind:

    1. The prior discourages overdispersion by penalizing distance from a base model
    2. User-defined scaling

\[ \begin{eqnarray} \sigma = \sqrt{1/\tau} \\ \text{Prob}(\sigma>U) = \alpha;\\ \qquad U>0, \ \alpha \in (0,1) \end{eqnarray} \]

  • \(U\) an upper limit for the standard deviation and \(\alpha\) a small probability.

  • \(U\) a likely value for the standard deviation and \(\alpha=0.5\).

Example

The Model

\[ \begin{aligned} y_i|\eta_i, \sigma^2 & \sim \mathcal{N}(\eta_i,\sigma^2)\\ \eta_i & = \beta_0 + f(t_i)\\ f(t_1),f(t_2),\dots,f(t_n) &\sim \text{RW2}(\tau) \end{aligned} \]

The code

cmp = ~ Intercept(1) + 
  time(year, model = "rw1",
       hyper = list(prec = 
                      list(prior = "pc.prec",
                           param = c(0.5, 0.5))))

RW as intrinsic models

RW1 defines differences, not absolute levels:

  • Only the changes between neighboring terms are modeled.

  • Mathematically, \[ (u_1,\dots,u_n)\text{ and }(u_1+a,\dots,u_n+a) \] produce identical likelihoods — they’re indistinguishable.

This means:

  • If \(u_t\sim\text{RW}1\) then \[ \eta_t = \beta_0 + u_t = (\beta_0+k) + (u_t-k) = \beta_0^* + u_t^* \] so parameters are not well-defined

Solution:

  • Sum to zero constraint \(\sum_{i = 1}^n u_i = 0\)
  • This is be default included in the model

RW as intrinsic models

cmp1 = ~ Intercept(1) + time(year, model = "rw1", constr = TRUE)
cmp2 = ~ Intercept(1) + time(year, model = "rw1", constr = FALSE)
lik = bru_obs(formula = Erie~.,
              data = lakes)

fit1 = bru(cmp1,lik)
fit2 = bru(cmp2,lik)
[1] "FIT1 - Intercept"
             mean    sd 0.025quant 0.5quant 0.975quant    mode kld
Intercept 174.138 0.002    174.134  174.138    174.142 174.138   0
[1] "FIT2 - Intercept"
          mean     sd 0.025quant 0.5quant 0.975quant mode kld
Intercept    0 31.623    -62.009        0     62.009    0   0
[1] "FIT1 - RW1 effect"
    ID   mean    sd 0.025quant 0.5quant 0.975quant   mode   kld
1 1918 -0.122 0.017     -0.156   -0.122     -0.083 -0.122 0.000
2 1919  0.038 0.018     -0.005    0.040      0.070  0.042 0.001
[1] "FIT2 - RW1 effect"
    ID    mean     sd 0.025quant 0.5quant 0.975quant    mode kld
1 1918 174.017 31.623    112.008  174.017    236.025 174.017   0
2 1919 174.176 31.623    112.168  174.176    236.185 174.176   0

Random walks of order 2

  • Just like RW1, but now we consider 4 neighbours instead of 2

\[ u_t = \text{mean}(u_{t-2} ,u_{t-1} , u_{t+1}, u_{t+2} ) + \text{some Gaussian error with precision } \tau \]

  • RW2 are smoother than RW1

  • The precision has the same role as for RW1

Example

cmp1 = ~ Intercept(1) + 
  time(year, model = "rw1", 
       scale.model = T,
       hyper = list(prec = 
                      list(prior = "pc.prec",
                           param = c(0.3,0.5))))

cmp2 = ~ Intercept(1) + 
  time(year, model = "rw2",
       scale.model = T,
       hyper = list(prec = 
                      list(prior = "pc.prec", 
                           param = c(0.3,0.5))))


lik = bru_obs(formula = Erie~ ., 
              data = lakes)

fit1 = bru(cmp1, lik)
fit2 = bru(cmp2, lik)

NOTE: the scale.model = TRUE option scales the \(\mathbf{Q}\) matrix so the precision parameter has the same interpretation in both models.

RW models as smoothers for covariates

  • RW models are discrete models

  • Covariates are often recorded as continuous values

  • The function inla.group() will bin covariate values into groups (default 25 groups)

inla.group(x, n = 25, method = c("cut", "quantile"))

  • Two ways to bin

    • cut (default) splits the data using equal length intervals
    • quantile uses equidistant quantiles in the probability space.

RW models as smoothers for covariates - Example

The data are derived from an anthropometric study of 892 females under 50 years in three Gambian villages in West Africa.

Age - Age of respondent (continuous)

triceps - Triceps skinfold thickness.

triceps$age_group = inla.group(triceps$age, n = 30)

Model fit and results

cmp = ~ Intercept(1) + cov(age_group, model = "rw2", scale.model =T)
lik = bru_obs(formula = triceps ~.,
              data = triceps)

fit = bru(cmp, lik)
pred = predict(fit, triceps, ~ Intercept + cov)

Summary RW (1 and 2) models

  • Latent effects suitable for smoothing and modeling temporal data.

  • One hyperparameter: the precision \(\tau\)

    • Use PC prior for \(\tau\)
  • It is an intrinsic model

    • The precision matrix \(\mathbf{Q}\) is rank deficient

    • A sum-to-zero constraint is added to make the model identifiable!

  • RW2 models are smoother than RW1

Auto Regressive Models of order 1 (AR1)

Definition

\[ u_t = \phi u_{t-i} + \epsilon_t; \qquad \phi\in(-1,1), \ \epsilon_t\sim\mathcal{N}(0,\tau^{-1}) \]

\[ \pi(\mathbf{u}|\tau)\propto\exp\left(-\frac{\tau}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u}\right) \]

with

\[ \mathbf{Q} = \begin{bmatrix} 1 & -\phi & & & & \\ -\phi & (1+\phi^2) & -\phi & & & \\ & & \ddots & \ddots & \ddots & \\ & & & -\phi & (1+\phi^2) & -\phi \\ & & & & -\phi & 1 \end{bmatrix} \]

AR1: Hyperparameters and prior

The AR1 model has two parameters

  • The precision \(\tau\)
  • The autocorrelation (or persistence) parameter \(\phi\in(0,1)\)

AR1: Hyperparameters and prior

The AR1 model has two parameters

  • The precision \(\tau\)

    • PC prior as before - Baseline \(\tau=0\) pc.prec \[ \text{Prob}(\sigma > u) = \alpha \]
  • The autocorrelation (or persistence) parameter $(-1,1)

    • Two choices of PC priors
  1. Baseline \(\phi = 0\) pc.cor0

\[ \begin{eqnarray} \text{Prob}(|\rho| > u) = \alpha;\\ -1<u<1;\ 0<\alpha<1 \end{eqnarray} \]

AR1: Hyperparameters and prior

The AR1 model has two parameters

  • The precision \(\tau\)

    • PC prior as before - Baseline \(\tau=0\) pc.prec \[ \text{Prob}(\sigma > u) = \alpha \]
  • The autocorrelation (or persistence) parameter $(-1,1)

    • Two choices of PC priors
  1. Baseline \(\phi = 1\) pc.cor1

\[ \begin{eqnarray} \text{Prob}(\rho > u) = \alpha;&\\ -1<u<1;\qquad &\sqrt{\frac{1-u}{2}}<\alpha<1 \end{eqnarray} \]

Example - AR1 and RW1 for earthquakes data

The Model

\[ \begin{aligned} y_t|\eta_t & \sim \text{Poisson}(\exp(\eta_t))\\ \eta_t & = \beta_0 + u_t\\ 1.\ u_t&\sim \text{RW}1(\tau)\\ 2.\ u_t&\sim \text{AR}1(\tau, \phi)\\ \end{aligned} \]

Number of serious earthquakes per year

hyper = list(prec = list(prior = "pc.prec", param = c(1,0.5)))
cmp1 = ~ Intercept(1) + time(year, model = "rw1", scale.model = T,
                             hyper = hyper)
cmp2 = ~ Intercept(1) + time(year, model = "ar1",
                             hyper = hyper)


lik = bru_obs(formula = quakes ~ .,
              family = "poisson",
              data = df)

fit1 = bru(cmp1, lik)
fit2 = bru(cmp2, lik)

Example - RW1 and AR1

Estimated trend

Example - RW1 and AR1

Predictions

AR1 vs RW models

Continuous time modelling